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Abstract 

We describe a new direct method for estimating structure and motion from image intensities of multiple 
views. We extend the direct methods of [7] to three views. Adding the third view enables us to solve for 
motion, and compute a dense depth map of the scene, directly from image spatio-temporal derivatives in 
a linear manner without first having to find point correspondences or compute optical flow. 
We describe the advantages and limitations of this method which are then verified through simulation and 
experiments with real images. 
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1 Introduction 

In this paper we present a new method for comput- 
ing motion and dense structure from three views. This 
method can be viewed as an extension of the 'direct 
methods' of Horn and Weldon [7] from two views (one 
motion) to three views (two motions). These methods 
are dubbed 'direct methods' because they do not require 
prior computation of optical flow. As with other gra- 
dient methods we assume small image motions on the 
order of a few pixels. 

Applying the constant brightness constraint [6] to the 
trilinear tensor of Shashua and Werman [12, 15] results 
in an equation relating camera motion and calibration 
parameters to the image gradients (first order only). We 
get one equation for each point in the image and we 
have a fixed number of parameters which results in a 
highly over-constrained set of equations. Starting with 
the general uncalibrated model we proceed through a 
hierarchy of reduced models first by assuming calibrated 
cameras and then by assuming the Longuett-Higgins and 
Prazdny small motion model [9]. This is described in 
section (2). We then show how to solve the simplified 
model for the motion parameters (section 3). 

This method has advantages over both optical flow 
methods [9] [10] and feature based methods [15]. We 
combine the information from all the points in the image 
and thus we avoid the aperture problem which makes 
computation of optical flow difficult. There is also no 
need to explicitly define feature points. Points which 
have little or no gradient simply contribute little to the 
least squares estimation. Information from all points 
that have gradients is used. 

These advantages are highlighted in a scene where we 
have a set of vertical bars in front of a set of horizon- 
tal bars and behind everything a uniform background. 
In this case optical flow methods will fail because of 
the straight bars and the aperture problem. Point fea- 
ture based methods fail because the intersections of the 
lines in the image, which will be detected as 'features' do 
not correspond to real features in space. Many natural 
scenes such as tree branches or man made objects such 
as window frames, lamp posts and fences often give rise 
to these problems. 

Section (4) describes some of the implementation de- 
tails needed to make the method work. Sections (5) and 
(6) show the results of applying this method to simu- 
lated and real images. The method is shown to produce 
useful results in both camera motion and depth estima- 
tion. Section (7) discusses some of the open questions 
and suggests possible solutions. 

1.1 Previous Work 

The 'direct methods' were pioneered by Horn and Wel- 
don in [7]. Using only a single image pair one has A 
equations in A + 6 unknowns, where A is the number of 
points in the image, so some added constraint is needed. 
Negahdaripour and Horn [11] present a closed form solu- 
tion assuming a planar or quadratic surface. McQuirk [8] 
shows that, assuming a pure translation model, the sub- 
set of the image points with a nonzero spatial derivative 
but a zero time derivative gives the direction of motion. 



The FOE is on a line perpendicular to the gradient at 
these points. But by using only this subset of the points 
we are throwing away the information from most of the 
image. Heel [5] uses multiple images from an image se- 
quence. He then employs Kalman filters to build up a 
structure model from more than one image pair but the 
core computation is fundamentally the same single image 
pair computation. 

This work is based on the work of Shashua and Hanna 
[14]. Here we describe the results of implementing these 
ideas in practice. During the course of implementation 
various subtleties and limitations were discovered. 

2 Mathematical Background 

2.1 Notations 

The pin-hole camera model in homogeneous coordinates 
is represented by a 3 x 4 camera matrix A producing the 
following relation: p = Ax where x varies in 3D space, p 
varies over the 2D image plane, and = denotes equality 
up to scale. Since only relative camera positioning can 
be recovered from image measurements, the first camera 
matrix (camera 0) can be represented by [7;0]. In a 
pair of views, p = [I; 0]x and p' = Ax, the left 3x3 
minor of A stands for a 2D projective transformation of 
the chosen plane at infinity and the fourth column of 
A stands for the epipole (the projection of the center of 
camera on the image plane of camera 1). In particular, 
in a calibrated setting the 2D projective transformation 
is the rotational component of camera motion and the 
epipole is the translational component of camera motion. 
We will occasionally use tensorial notations, which 
are briefly described next. We use the covariant- 
contravariant summation convention: a point is an ob- 
ject whose coordinates are specified with superscripts, 
i.e., p % = (p 1 ,p 2 , •••). These are called contravariant vec- 
tors. An element in the dual space (representing hyper- 
planes — lines in V 2 ), is called a covariant vector and 
is represented by subscripts, i.e., Sj = (si, S2, ....). In- 
dices repeated in covariant and contravariant forms are 
summed over, i.e., p % S( = p l s\ + p 2 S'j + ... + p n s n . This 
is known as a contraction. For example, if p and s rep- 
resent a coincident point and line in V 2 , then p % Si = 0. 
Vectors are also called 1- valence tensors. 2- valence ten- 
sors (matrices) have two indices and the transformation 
they represent depends on the covariant-contravariant 
positioning of the indices. For example, a\ is a mapping 
from points to points, and hyperplanes to hyperplanes, 
because a\p % = q : and ajsj = r 8 - (in matrix form: Ap = q 
and A T s = r); a 8 j maps points to hyperplanes; and a 8J 
maps hyperplanes to points. When viewed as a matrix 
the row and column positions are determined accord- 
ingly: in a\ and aji the index i runs over the columns 

and j runs over the rows, thus b;a\ = c^' is BA = C in 
matrix form. An outer-product of two 1-valence tensors 
(vectors), a 8 & J , is a 2-valence tensor c] whose i, j entries 
are a 8 & J — note that in matrix form C = ba T . 

Matching image points across three views will be de- 
noted by p,p',p"; the coordinates will be referred to as 
p l ,p'i ,p" k , or alternatively as non-homogeneous image 



coordinates (x, y), (x' , y'), (x" , y"). 

2.2 Stratification of Direct Motion Models 

Three views, p = [l;0]x,p' = Ax and p" = Bx, are 
known to produce four trilinear forms whose coefficients 
are arranged in a tensor representing a bilinear function 
of the camera matrices A, B: 



J k 
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(1) 

where A = [a\ , t' : ] (aj is the 3x3 left minor and t' is the 
fourth column of A) and B = [bf , t" k ] . The tensor acts 
on a triplet of matching points in the following way: 

(2) 
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where s^ are any two lines {s l - and s 2 ) intersecting at p' , 
and r p k are any two lines intersecting p" . Since the free 
indices are p, p each in the range 1,2, we have 4 trilinear 
equations (which are unique up to linear combinations). 
More details can be found in [4, 12, 15, 13]. 

Geometrically, a trilinear matching constraint is pro- 
duced by contracting the tensor with the point p of im- 
age 0, some line coincident with p' in image 1, and some 
line coincident with p" in image 2. In particular, we 
may use the tangent to the iso-brightness contour at p' 
and p" , respectively. In other words, one can recover in 
principle the camera matrices across three views in the 
context of the "aperture" problem, as noticed by [16]. 
Alternatively, if we represent the tangent to the corre- 
sponding iso-brightness contours by the instantaneous 
spatio-temporal derivatives at p (shown next), we will 
get a constraint equation involving the unknowns a\ 
and the spatio-temporal derivatives at each pixel — the 
constraint is linear in the unknowns. This constraint was 
introduced by [14] under the name "Tensor Brightness 
Constraint". This is briefly derived next. 

We can describe any line, in particular a line through 
p' , as a linear combination of the vertical (1, 0, —x') and 
horizontal (0,1,— y') lines. Let the coefficients of the 
linear combination be the components of the image gra- 
dient I x , I y at (x, y) in image 0, then the line s' has the 
form: 

f h \ 
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\ % ix y ly / 
The contribution of x',y' can be removed by using the 
constant brightness equation due to [6]: 

u'I x + v'ly +11 = (3) 

where u' = x — x' , v' = y — y' and I' t is the discrete 
temporal derivative at (x,y), i.e., Ii(x, y)—Io(x, y) where 
I\ and 7o are the image intensity values of the second 
and first images, respectively. Following substitution we 
obtain, 

/ h \ 
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(4) 



Likewise, the tangent to the iso-brightness contour at p" 



S" 



I x 



(5) 



where 7" is the temporal derivative between images 
and 2. The tensor brightness constraint is therefore: 
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(6) 



We have one such equation for each point on the im- 
age where s'l and s'- can be computed from the image 
gradients and p = (x,y, 1) T are the (projective) image 
coordinates of the point in image 0. We wish to solve for 
a\ which combines the motion and camera parameters 
Starting from the general model (27 parameter model) 
of the constraint equation one can introduce a hierarchy 
of reduced models, as follows. By enforcing small-angle 
rotation on the camera motions, i.e., A = [I + [w'] x ;t'] 
and B = [I + [w"] x ;t"] where w',w" are the angular 
velocity vectors and [-] x is the skew-symmetric matrix 
of vector products, the tensor brightness constraint is 
reduced to a 24-parameter model which in matrix form 
looks like: 

I?S' T t' - I^S" T t" + S' T [t'w" T ]V" - S" T [t"w' T }V = 0, 

(7) 
where 

/ -h + y( I 't ~ xI x -yi y ) \ 

V'=pxS'=i I x - x{I[ - xl x - yiy) 

\ xl y - yl x J 

and 

/ -Iy + y(Ii'-xI x -yI y ) \ 

V"=pxS"=i I x - x{I' t ' - xl x - yiy) 

V xly - yl x J 

If, in addition, we enforce infinitesimal translational 
motion (the Longuett-Higgins & Prazdny [9] motion 
model), which results in the image motion equations: 



u' = —(t'i — xt' 3 ) — w' 3 y + w' 2 (l + x ) — w[xy 

v ' = -(4 - 2^3) + w 3 x - w 'i( l + y' 2 ) + w 2 x y 

then S has the simpler form: 



(8) 



and 



S=[ Iy ) 

\ -xl x - yiy ) 
-Iy - y(xl x +yl y ) 



(9) 



V = I x + x(xl x + yl y ) I (10) 

V xly - yl x J 

and we obtain a 15-parameter model of the following 
form: 



I?S T t'-I^S T t" + S T [t 
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t"w ll ]V = (11) 



We now have one such equation for each point in the 
image. It is a set of bilinear equations in the unknowns 
t' , t" , w' , w". After solving for the camera motions (to be 
described later in the paper) we can solve for the dense 
depth map from the equations: 

KS T t' + V T w' + I' t = (12) 

KS T t" + V T w" + I' t ' = (13) 

where K = - denotes inverse of the depth at each pixel 
location. Equations (12)(13) are obtained by substitut- 
ing (eq. 8) in equation (3) and rearranging the terms. 
See [7] for more details. 



3 Solving the bilinear equation 

3.1 The pure translation case 

In the pure translation case equation (11) becomes: 

I' t 'S T t' - I[S T t" = (14) 

We have one such equation for each image point and we 
can write it out in the matrix form: 



At = 



(15) 



where: 



t = (t' x r y r z ?> t<< f z y (i6) 

and A is an N x 6 matrix with the i'th row (corresponding 
to the i'th pixel) given by: 

( If Sil I t S{2 If S{3 I t Sn I t S{2 IfSi3 ) (17) 

Since we wish to avoid the trivial solution t = we will 
add the constraint ||t|| = 1. The least squares problem 
now maps to problem of finding ||t|| = 1 that minimizes: 

t T A T At = (18) 

The solution is the eigenvector of A T A corresponding to 
the smallest eigenvalue. 

3.2 The general small motion case 

In the general case we are confronted with the bilin- 
ear equation (11). There is no standard way to solve 
these bilinear problems and we have chosen to treat the 
6 translation parameters and the 9 outer product terms 
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t"w ] as 15 intermediate parameters. Although 
they are not independent if we act as if they are, then 
they can be solved for as in section (3.1) but with a 
15x15 matrix A. After recovering the 15 intermediate 
parameters we compute w' and w" from [t'w" T — t"w' T ] 
using the computed translation. 

This is the general idea but it is not that simple. If we 
consider the N x 15 matrix A we note that the columns 
of A are not independent. 

Lemma 1 The matrix A is of rank < 13. 

Proof: The 7th, 11th and 15th elements along each row 
add up to zero. 

A(i,7) + A(i,ll)+A(i,15) = 
= SuVii + Si-jVi'j + SisVis 

= s-v 

= S-Px S 

= 

Therefore the vector 

c = (0,0, 0,0, 0,0, 1,0, 0,0, 1,0, 0,0, if (19) 

is in the null space of A. The correct solution vector is 
also in the null space of A. Therefore the null space of 
A has rank > 2 and A is of rank < 13. ■ 

The matrix A is in theory of rank 13 not 14. Suppose 
we found a vector &o in the null space of A such that 
Co x to / 0. The two vectors Co and &o span the null 
space of A and the desired solution vector & is a linear 
combination of the two: 



In order to find a given Co and &o we must apply some 
constraint. We choose to enforce the constraint that the 
matrix [t'w" T - t"w' T ] be of rank 2. 

Clearly the choice of a will have no affect the first 
6 elements of the vector b. Let us arrange the last 9 
elements of b, &o and Co into the corresponding 3x3 
matrices B, Bo and Co- We are now looking for an a 
such that: 

Rank (B - aC ) = 2 (21) 

Since Co in our case is the identity matrix, the solution 
for a is given by the eigenvalues of Bo- We choose the 
one with the smallest absolute value. The vector &o is the 
eigenvector corresponding to the second smallest eigen- 
value of the matrix A T A. (The vector Co corresponds to 
the smallest.) 

Based on the previous arguments the algorithm for 
finding the motion parameters is as follows: 

1. Compute the N x 15 matrix A from equation (11). 

2. Find the eigenvector corresponding to the second 
smallest eigenvalue of the matrix A T A. This is bo- 

3. The first 6 elements of &o are the two translations, 
t' and t". 

4. Arrange the other 9 elements of &o into a 3 x 3 
matrix Bo ■ 

5. Construct a rank 2 matrix B from Bq: 



B =Bo-aI 



(22) 



b = &o + «co 



(20) 



6. Solve: 

[ t ' w "T _ t 'V T ] = B (23) 

for w' and w" , given t' and t" from step (3). 

3.3 The singular case: 

This method fails when the two motions are in the same 
(or opposite) directions. This is obvious in the pure 
translation case. If the second translation vector t" is 
proportional to the translation vector t' then equation 
(13) is simply a scaled version of equation (12) adding 
no new information. Under current research is a way to 
overcome this problem (section 7) and initial results look 
promising. 

4 Implementation details 

4.1 Iterative refinement and coarse to fine 
processing 

The constant brightness constraint is a linearized form of 
the Sum Square Difference (SSD) criteria. The linear so- 
lution can be thought of as a single iteration of Newton's 
method applied to the problem. Iterative refinement is 
performed as follows: First one calculates motion and 
depth using the above equations. Then, using the depth 
and motion, images 1 and 2 are warped towards image 
0. A correction to the depth and motion is computed 
using the warped images. In the ideal case, as the final 
result the warped images should appear nearly identical 
to image 0. 

We have to be careful here. If at every iteration we 
compute only the correction to the motion we will end up 
trying to compute very small values of St. At this point 



our equations will be badly conditioned and highly af- 
fected by noise. The idea is to devise our iterations in 
such a way as to always be computing a gradually im- 
proving estimate of the translation t rather than the in- 
cremental improvements, but using temporal derivatives 
from closer and closer images. 

Let \?o, ^1 and 'J 2 be the three image. Assume we 
have K, t J , w J , from the previous iteration. The image 
motions it? and v : can be computed using K, t : and w : 
in equation (8). These are then used to warp images \?i 

to \?i and ^2 to \?2. After warping, the images satisfy 
the brightness constraint equation: 
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Figure 1: The texture image (a) and simulated depth 
map(b) used for simulation experiments. 



I x du' + Iydv' + I' t = 
I x du" + I y dv" + I'{ = 



(24) 



where the temporal derivatives at each pixel are given 
by: 






(25) 



and du : , dv : are the (still unknown) differences between 
computed image motions and the real image motions: 



Let: 



du J = u J — v? 
dv- 1 = i> J — v- 1 



a 3 = I x v? + IyV 3 



which can also be written as: 

a j = KS T P + V T w j 



(26) 



(27) 



(28) 



Substituting equations (27) and (26) in equation (24) we 
get: 



I x u + I y v + (I' t - a') = 



(29) 



Substituting equation (8) in equation (29) we get modi- 
fied versions of the equations (12) and (13) 



KS T t' + V T w' + (I' t - a') = 
KS T t" + V T w" + (![' - a") = 



(30) 



We start our first iteration with K, t J , w J all zero and 
therefore a = as well. 

In order to deal with image motions larger than 1 pixel 
we use a Gaussian pyramid for coarse to fine processing 

[2] [3]. 

4.2 Computing the depth, smoothing and 
interpolation. 

After recovering the camera motions, equations (12) and 
(13) give us a way of computing the depth at every point 
where S T t' or S T t" are non zero. In order to combine 
the information from both images and to interpolate over 
areas where the image gradients are small we chose an 
interpolation scheme called Local Weighted Regression. 
This method was chosen because it is simple to imple- 
ment but could very well be replaced by other methods 
such as RBF's, B-splines or thin-plate interpolation. 



Equation (31) shows the cost function used to com- 
pute the depth at a given point: 



mmarg K ^ /_• 
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P(x, y)\S 2 V \ p [ KS 1 V + V 1 w 1 + I\ 

( 31 ) 
The sum is over a region R and over the two motions 

j = 1, . . . , 2. The windowing function /3(x, y) allows one 
to increase the weight of the closer points. We used a 
function created by convolving two box filters together. 
It is a crude approximation to a Gaussian. The |5' T t J | p 
term reduces the weight of points which have a small gra- 
dient or where the gradient is perpendicular to that cam- 
era motion since these cases are highly affected by noise. 
We used p = 1. The size of the region R depends on the 
amount of smoothing and interpolation required. During 
the iteration process we used a region of 5 x 5 or 7 x 7. 
In order to get 'prettier' results, after the last iteration, 
we replaced K which is a locally constant depth model 
with a locally planar model K(x, y) = K x x + K y y + A'o 
and increased the region of support to 30 x 30. 

5 Simulation 

To test the method we took the 320 x 240 texture image 
(fig. la) and defined the depth at each point in the image 
according to the equation: 

Z(x, y) = 1000 + 400 * (sin(^) + sin(^)) (32) 

The depth map is shown in figure (lb). We then warped 
it according to two given motions and equation (8). We 
used the original image and the two warped images as 
input to our algorithm and computed the motion and 
depth. The motion was chosen so that the image motion 
was 8 pixels or less. The first image was warped by a 
translation oft' = (2000,0,500) and no rotation. The 
second image was warped by a rotation w" = (0.5, 0, 0) 
and a translation t" = (0,2000,0). The rotation and 
translation were chosen so that they combined to reduce 
the overall motion. The focal length was / = 50. 

We used 4 levels of coarse to fine processing and 2 
iterations at each level. Figure (2a) shows the recon- 
structed depth map at the finest level. Figure (2b, 2c) 
shows the 3D rendering of the surface using a local con- 
stant depth fit and a local planar fit respectively. Apart 
from a few outliers around the edges the surface is qual- 
itatively correct. Table (1) shows the given motion and 




Table 1: Estimated camera motion in simulation exper- 
iments 



(c) Local planar lit. 

Figure 2: The estimated K(x,y) = - 
from simulation. 



Translation 

FOE 1: 
FOE 2: 


real 
(360, 120 ) 
(160, inf) 


estimate 
(369.4, 119.8) 
(170.8, 6097) 


Rotation 

Wl 
W2 


real 

(0,0,0) 

(-0.5,0,0) 


estimate 

(-0.0018, 0.00034,0.0012) 

(-0.48, 0.0055, 0.0096) 



Table 2: Image coordinates of two example points in the 
three images 



Head 
Cork 



Image 

(255,206) 
(72,291) 



Image 1 
(254,195) 
(70,283) 



Image 2 
(264,206) 
(78,291) 



(inverse depth) 



the resulting motion estimates. The results are shown 
using a 7 x 7 region for depth estimation. Varying the 
size of the region from 3x3 to 11 x 11 had less than 1% 
effect on the motion estimation. 

6 Experiments with real images 

6.1 Experimental procedure 

The images were taken with a Phillips ^inch CCD video 
camera and an 8rara lens. Image capture was performed 
using the SGI Indy built in frame grabber at 640 x 480 
pixel resolution. Figure (3a) shows one of three images 
used for the experiment. The depth in the image ranged 
from 450mm to 750mm. The camera was mounted on 
a lightweight tripod. After taking the first image the 
camera was lowered 3mm and a second image was taken. 
The tripod was then moved a few millimeters to the right 
for the third image. No special care was taken to ensure 
precise motion. Image motions were 6 — llpixels. Table 
(2) shows the measured image coordinates for two points 
in the three images: one on the head and one on the cork 
panel on the image left. 

6.2 Results 

The results are shown for the case where images were 
processed using 4 levels of coarse to fine processing with 
2 iterations at each level. Varying the number of iter- 
ations from 1 through 4 had no qualitative impact but 
using a single iteration caused a small change in the re- 
sulting motion estimates. A 7 x 7 region was used for the 
local constant depth fit at all levels. Table (3) shows the 
estimated camera motions. It shows that the first mo- 
tion was along the Y axis and the second motion along 
the X axis and that for both cases the rotation was neg- 
ligible. This is qualitatively correct. We do not have 
accurate ground truth estimates. 

Figure (3b) shows the recovered depth map (in fact 
this shows K(x,y) = -g-). Figure (3c) shows a 3D ren- 
dering of the surface K(x,y). The K values have been 
scaled by 100.0. The rendering uses an orthographic 
projection. In order to get smoother and more visually 



Table 3: Motion estimates from real images. 



FOE 1 
FOE 2 


(-603.7, -8055) 
(16302, 300.2) 


Wl 
W2 


(0.00022,-0.00055,-0.0217) 
(-0.00027-0.00017-0.00062 ) 



pleasing results a local planar fit was used for the final 
stage using a 30 x 30 region of support. The results are 
shown in figures (4a, 4b). There is noticeable smoothing 
and overshoots at depth discontinuities and the tip of 
the nose. In (4b) the texture was removed for clarity. 

7 Discussion and future work 

We have presented a new method for recovering struc- 
ture and motion from 3 views. This method does not 
require feature correspondence or optical flow. We have 
shown, using simulation and experiments with real im- 
ages, that the method can qualitatively recover depth 
and motion in the general, small motion case. These re- 
sults are promising but more experiments are needed to 
test the accuracy of the motion estimation. 

Occlusions do not pose a significant problem in the 
motion estimation since they take up only a small part 
of the total image area. The depth estimates obtained 
along the occlusion boundary will be inaccurate. 

In general one can only obtain a depth estimate where 
there is a gradient. In order to get a dense depth map 
one must perform some form of interpolation. The local 
constant or linear fit which we have used is superficially 
similar to the smoothness assumptions employed by op- 
tical flow techniques to overcome the aperture problem. 
We could benefit from some details of these techniques 
such as adaptive window size. The main difference is 
that here, the smoothness does not affect our motion es- 
timation and is for cosmetic reasons, while for optical 
flow methods the smoothness assumption is key to the 
whole process. Another difference is that we are smooth- 
ing the depth image for which it might make sense to 
assume a local planar or quadratic model. This is very 
much application dependent. 

There are two important theoretical questions left 
open. The first involves solving the bilinear equation 
(11). We have chosen to enforce a rank 2 constraint 
on the matrix [t'w" T — t"w' T ] but there might be other 
possible constraints to use. Furthermore, we make the 
matrix rank 2 by subtracting al where a is the eigen- 
value closest to zero. But perhaps we should use one 
of the other two eigenvalues. On the practical side, it 
might be best to solve equation (11) as a nonlinear opti- 
mization problem with the linear solution as the initial 
guess. 

The second open problem deals with the current limi- 
tation that the two motions cannot be collinear since this 
has been shown (section 3.3) to be a singular case. On 
the other hand no such limitation exists in the discrete 
case [15]. There is a question whether this is a general 
phenomena resulting from using the constant brightness 
equation or whether this is specific to the LH model. 
There are two avenues to proceed. We can go back to 




Figure 3: One of the three input images (a) and the 
estimated K(x,y) = 4 inverse depth map (b) and 3D 
rendering of the surface with K scaled by 100.0 (c). Uses 
7x7 region and a local constant depth model. 




Figure 4: 3D rendering of the estimated surface 
K(x,y) = 2 (inverse depth) scaled by 100.0. Uses 
30 x 30 region and a locally planar depth model. Note 
the overshoot along the depth discontinuities around the 
head. 



more complex motion models described in section (2), 
the 24 parameter model or even the full 27 parameter 
'Tensor Brightness Constraint' equation. 

Alternatively one can look again at the 'Constant 
Brightness Equation' (3). In our implementation we cal- 
culate I x and I y at image 0. I x and I y and therefore the 
vector S are the same for both image pairs. If instead, 
the derivatives were computed at image 1 and image 2 
the equations (12) and (13) would give us independent 
equations even in the collinear motion case. 
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